An approach to analysis of Arabic text documents into text lines, words, and characters

Abstract

Extracting text lines from a text document image, segmenting them into isolated words, and segmenting those words into individual characters is considered one of the most critical processes in developing OCR systems that turn documents into a searchable electronic representation. This paper presents a new approach to analyzing Arabic documents. The proposed approach contains four steps: preprocessing, line segmentation, word segmentation, and character segmentation. The horizontal projection method is used to detect and extract text lines from the preprocessed document image. In the word segmentation step, a space threshold is computed to determine whether the spaces among connected components are within-word or between-word spaces. Finally, for character segmentation, thinning is applied to find the skeleton of each segmented word, and the geometric characteristics of ligatures and characters are analyzed. The approach was tested and evaluated on a set of 115 images, some from the KFUPM Handwritten TexT (KHATT) database and some produced by the authors. The experimental results are extremely encouraging, with success rates of 98.6% for lines, 96% for words, and 87.1% for characters.
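
The abstract names horizontal projection for line extraction and a space threshold for word segmentation but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the input is a binary image stored as a list of rows (1 = ink), text lines are separated by ink-free rows, and, as a simplification, gaps are measured between ink runs of each line's vertical projection rather than between true connected components. The helper names (`ink_runs`, `segment_lines`, `line_pieces`, `group_words`, `segment_page`) and the default threshold (mean gap width over the whole page) are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image as rows: 1 = ink, 0 = background
Span = Tuple[int, int]   # half-open index interval [start, end)


def ink_runs(profile: List[int]) -> List[Span]:
    """Return the [start, end) runs where a projection profile is non-zero."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Span]:
    """Horizontal projection: ink count per row; ink-free rows separate text lines."""
    return ink_runs([sum(row) for row in img])


def line_pieces(img: Image, line: Span) -> List[Span]:
    """Column runs of ink within one text line (its vertical projection)."""
    top, bottom = line
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return ink_runs(profile)


def group_words(pieces: List[Span], threshold: float) -> List[Span]:
    """Merge pieces whose gap is <= threshold; wider gaps start a new word."""
    words: List[Span] = []
    for start, end in pieces:
        if words and start - words[-1][1] <= threshold:
            words[-1] = (words[-1][0], end)  # within-word space: extend current word
        else:
            words.append((start, end))  # first piece or between-word space
    return words[::-1]  # Arabic is read right to left


def segment_page(img: Image, threshold: Optional[float] = None) -> List[Tuple[Span, List[Span]]]:
    """Return (row span of each line, column spans of its words in reading order)."""
    lines = segment_lines(img)
    pieces = [line_pieces(img, line) for line in lines]
    gaps = [b[0] - a[1] for runs in pieces for a, b in zip(runs, runs[1:])]
    if threshold is None:
        # Assumption: the space threshold is the mean gap width over the page.
        threshold = sum(gaps) / len(gaps) if gaps else 0.0
    return [(line, group_words(runs, threshold)) for line, runs in zip(lines, pieces)]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0, 0, 0, 1, 1],
        [1, 1, 0, 1, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 1, 0, 0],
    ]
    print(segment_page(page))
    # [((0, 2), [(7, 9), (0, 4)]), ((3, 4), [(6, 7), (1, 3)])]
```

Computing the threshold over the whole page rather than per line avoids the degenerate case of a line with a single gap, where any per-line statistic equals that gap. Vertical projection cannot separate words or sub-words whose strokes overlap horizontally; measuring gaps between connected components, as the abstract describes, is better suited to that case.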

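The abstract says thinning is applied to obtain the skeleton of the segmented words but does not name the thinning algorithm. As an assumption, this sketch uses the classic Zhang-Suen parallel thinning algorithm, a common choice in OCR pipelines that may differ from what the authors used. The function name `zhang_suen_thin` is hypothetical, and the input is a binary image stored as a list of rows (1 = ink).

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image as rows: 1 = ink, 0 = background


def zhang_suen_thin(img: Image) -> Image:
    """Thin a binary image to a skeleton with the Zhang-Suen parallel algorithm."""
    out = [row[:] for row in img]  # work on a copy; the input is left untouched
    h = len(out)
    w = len(out[0]) if out else 0

    def px(r: int, c: int) -> int:
        # Pixels outside the image are treated as background.
        return out[r][c] if 0 <= r < h and 0 <= c < w else 0

    def neighbours(r: int, c: int) -> List[int]:
        # P2..P9: clockwise, starting from the pixel directly above (north).
        return [px(r - 1, c), px(r - 1, c + 1), px(r, c + 1), px(r + 1, c + 1),
                px(r + 1, c), px(r + 1, c - 1), px(r, c - 1), px(r - 1, c - 1)]

    changed = True
    while changed:
        changed = False
        for sub_iteration in (1, 2):
            to_delete: List[Tuple[int, int]] = []
            for r in range(h):
                for c in range(w):
                    if out[r][c] != 1:
                        continue
                    n = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of ink neighbours
                    # a = number of 0 -> 1 transitions in the cyclic sequence P2..P9, P2
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if sub_iteration == 1:
                        # Remove south/east boundary points and north-west corners.
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        # Remove north/west boundary points and south-east corners.
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_delete.append((r, c))
            # Deletions are applied only after the full scan (parallel update).
            for r, c in to_delete:
                out[r][c] = 0
            changed = changed or bool(to_delete)
    return out


if __name__ == "__main__":
    block = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(3)] + [[0] * 7]
    for row in zhang_suen_thin(block):
        print("".join(str(v) for v in row))
```

On the solid 3 x 5 block in the demo, the algorithm peels off the outer layers and leaves a two-pixel horizontal segment on the middle row. Deferring deletions to the end of each sub-iteration is what makes the method parallel: every pixel is judged against the image as it was at the start of that sub-iteration. Analyzing the resulting skeleton for ligatures and character boundaries is a separate step that is not sketched here.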

Similar sources

Segmenting Arabic Handwritten Documents into Text lines and Words

In this paper, we present a method for segmenting Arabic handwritten documents into text lines and words. Text line segmentation is addressed by a well-known technique, the horizontal projection profile, in which autocorrelation is used to enhance the self similarity of this profile. This technique promotes the estimation of text line spacing. Word extraction is based on an adaptation of a know...

Handwritten document image segmentation into text lines and words

Article history: Received 22 July 2008; received in revised form 23 February 2009; accepted 14 May 2009

Explicitation in interlingual and intralingual translations of Shahnameh Ferdowsi: a text linguistic approach

A study and comparison of the differences and similarities between intralingual and interlingual translation, with a focus on text linguistics. For the comparison, the frequency of explicitation used in the intralingual and interlingual translations of Ferdowsi's Shahnameh was examined.

Competitive Intelligence Text Mining: Words Speak

Competitive intelligence (CI) has become one of the major subjects for researchers in recent years. The present research is aimed to achieve a part of the CI by investigating the scientific articles on this field through text mining in three interrelated steps. In the first step, a total of 1143 articles released between 1987 and 2016 were selected by searching the phrase "competitive intellige...

Arabic text summarization based on latent semantic analysis to enhance Arabic documents clustering

Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in the Arabic language. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by the document...

Journal

Journal title: Indonesian Journal of Electrical Engineering and Computer Science

Year: 2022

ISSN: 2502-4752, 2502-4760

DOI: https://doi.org/10.11591/ijeecs.v26.i2.pp754-763